-
- APPENDIX D.
-
- HYPERTEXT INFORMATION ACCESS STUDY
-
- INTERVIEW SUMMARY
- NEIL LARSON
- BERKELEY, CAL.
- MARCH 11, 1991
-
-
-
- A. HYPERTEXT ARCHIVE TRANSACTION/SUPPORT SYSTEM:
-
- A.1. Please summarize the basic hypertext document content
- assembly & maintenance procedures.
-
- The database consists of accounting and auditing
- information converted to a hypertext format for
- Deloitte & Touche. A Deloitte & Touche group has
- defined the strategic plan for the database. This
- plan covers content, maintenance, and expansion.
- Deloitte & Touche takes care of providing all material
- for the database.
-
- ** The information arrives almost entirely in hard
- copy printed form, and must be converted to electronic
- format. The current operation uses a Kurzweil OCR
- unit for conversion processing, and can convert
- approximately 500 pages per day, if needed, with two
- people working. One person operates the OCR unit;
- the other handles spell-checking, error correction,
- and initial text formatting.
-
- ** The next step is document inspection, analysis,
- and conversion to a screen format suitable for hypertext.
- The major task is breaking a linear document into
- separate hierarchical sections. This includes several
- subtasks. First, the screen format is adapted for
- best display and user comprehension. The document
- must be divided into multiple short sections,
- attempting to form logical text units or hypertext
- nodes covering a single topic, within a preferred
- maximum of one screen length. As source text is split
- up into hypertext nodes, the author embeds links to
- other relevant sections, and "continuity links" to
- previous and next sections.
-
- ** The author must also insert links which fit the
- total document into the system's overall conceptual
- hierarchy. He is continually revising and redefining
- that conceptual hierarchy.
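-
- The node-splitting step described above can be sketched
- in Python. This is a minimal illustration, not
- TransText's actual behavior: the 20-line screen budget,
- the blank-line paragraph boundary, the "Previous:" and
- "Next:" labels, and the node names (a base prefix plus
- a three-digit sequence, to stay within an 8.3 MS-DOS
- filename) are all assumptions. Continuity links are
- written as angle-bracketed filenames.
-
```python
SCREEN_LINES = 20  # assumed text lines available per screen (hypothetical)


def split_into_nodes(text: str, base: str,
                     screen_lines: int = SCREEN_LINES) -> dict[str, str]:
    """Split linear text into screen-sized nodes with prev/next links."""
    # Treat blank lines as paragraph boundaries (assumption).
    paragraphs = [p.strip().splitlines() for p in text.split("\n\n") if p.strip()]

    chunks: list[list[str]] = []
    current: list[str] = []
    for para in paragraphs:
        extra = len(para) + (1 if current else 0)  # +1 for a blank separator
        # Close the current node if this paragraph would overflow the screen.
        # A single paragraph longer than the budget still becomes one node.
        if current and len(current) + extra > screen_lines:
            chunks.append(current)
            current = []
        if current:
            current.append("")
        current.extend(para)
    if current:
        chunks.append(current)

    # Hypothetical naming: base prefix + 3-digit sequence, e.g. DOC001.
    names = [f"{base}{i:03d}" for i in range(1, len(chunks) + 1)]
    nodes: dict[str, str] = {}
    for i, (name, lines) in enumerate(zip(names, chunks)):
        footer = []
        if i > 0:
            footer.append(f"Previous: <{names[i - 1]}>")  # continuity link back
        if i < len(names) - 1:
            footer.append(f"Next: <{names[i + 1]}>")      # continuity link forward
        nodes[name] = "\n".join(lines + [""] + footer)
    return nodes
```
-
- A paragraph longer than the screen budget still becomes
- one oversize node here; the sketch leaves such cases for
- the author to divide by hand.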
-
- MaxThink hypertext links are designated in the form of
- the target MS-DOS filename surrounded by angle bracket
- characters. This link convention is case-insensitive;
- e.g., either of the following forms is valid:
- <filename>
- <FILENAME>
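-
- Because a link is just a bracketed filename, a program
- can find links with a simple pattern match. This sketch
- assumes link names use only letters, digits,
- underscores, and hyphens in 8.3 form; the character set
- MaxThink actually accepts may differ. Names are upper-
- cased so that <filename> and <FILENAME> compare equal.
-
```python
import re

# Assumed link syntax: an 8.3 MS-DOS filename between angle brackets.
LINK_RE = re.compile(r"<([A-Za-z0-9_\-]{1,8}(?:\.[A-Za-z0-9_\-]{1,3})?)>")


def extract_links(node_text: str) -> list[str]:
    """Return link targets in order of appearance, normalized to upper case."""
    # Upper-casing reflects the case-insensitive link convention
    # (MS-DOS filenames are themselves case-insensitive).
    return [m.group(1).upper() for m in LINK_RE.finditer(node_text)]
```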
-
-
- A.2. GENERAL PRESENTATION AND PRODUCTION DESIGN OF THE
- HYPERTEXT ACCESS SYSTEM:
-
- A.2.a. Describe the general arrangement of the main
- document file. (Unique document identification,
- general logical arrangement, basic principle of
- access)
-
- MaxThink elected to use plain MS-DOS ASCII text
- files for basic node text storage. The text files are
- directly accessible by the combination of subdirectory
- name and file name.
-
- MS-DOS file retrieval performance seriously degrades
- if there are substantially more than 100 files in a
- disk subdirectory. They work around this limitation by
- using a system of hierarchical or specialized
- subdirectories, limiting each to approximately 100
- files. They use a general classification approach in
- assigning files to subdirectories. [NOTE:
- Subdirectory approach is also covered in the Phillips
- interview notes.]
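-
- One way to apply the roughly-100-files rule is sketched
- below, under stated assumptions: each file carries a
- hypothetical classification code, each class gets its
- own subdirectory, and an oversized class spills into
- numbered subdirectories (AUD1, AUD2, ...). The
- interview does not say how large classes are split.
-
```python
from collections import defaultdict

MAX_FILES_PER_DIR = 100  # approximate per-subdirectory limit cited above


def assign_subdirectories(files: dict[str, str],
                          limit: int = MAX_FILES_PER_DIR) -> dict[str, list[str]]:
    """Map each subdirectory name to the files placed in it.

    `files` maps a filename to a classification code (hypothetical,
    e.g. "AUD" for auditing material).
    """
    by_class: dict[str, list[str]] = defaultdict(list)
    for name, cls in files.items():
        by_class[cls].append(name)

    layout: dict[str, list[str]] = {}
    for cls, names in sorted(by_class.items()):
        names.sort()
        if len(names) <= limit:
            layout[cls] = names
        else:
            # Spill an oversized class into numbered subdirectories.
            for i in range(0, len(names), limit):
                layout[f"{cls}{i // limit + 1}"] = names[i:i + limit]
    return layout
```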
-
- They use file-naming conventions to produce unique
- text filenames. These standardized names may reflect
- a combination of factors, such as source of
- information, document type, time/date of publication,
- source file section, etc. The conventions are
- generally mnemonic, so users easily learn the coding,
- and can predict file content.
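-
- A hypothetical naming scheme in this spirit: a two-
- letter source code, a two-digit year, and a four-digit
- section number form the 8-character stem, and a three-
- letter document-type code forms the extension. The
- field layout is an illustration only; the interview
- does not give the actual DaTa conventions.
-
```python
def make_filename(source: str, year: int, section: int, doc_type: str) -> str:
    """Build a mnemonic 8.3 filename, e.g. FA900012.STD (hypothetical scheme)."""
    if len(source) != 2 or not source.isalpha():
        raise ValueError("source code must be two letters")
    if not 0 <= section <= 9999:
        raise ValueError("section must fit in four digits")
    if len(doc_type) != 3 or not doc_type.isalnum():
        raise ValueError("document type must be three characters")
    return f"{source.upper()}{year % 100:02d}{section:04d}.{doc_type.upper()}"


def parse_filename(name: str) -> tuple[str, int, int, str]:
    """Recover (source, two-digit year, section, doc type) from a name."""
    stem, ext = name.upper().split(".")
    return stem[:2], int(stem[2:4]), int(stem[4:8]), ext
```
-
- Because every field sits at a fixed position, a user who
- knows the codes can predict a file's content from its
- name, and utilities can recover the fields mechanically.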
-
- A.2.b. Please summarize the general concepts of the
- system's "user interface," the document access and
- display methods, design of the presentation means,
- etc.
-
- Larson says the design goal was to arrange information
- in a clear, simple manner, so that people can find
- it. They developed a hypertext presentation mechanism
- which they feel is intuitively obvious. They also
- attempted to design powerful hierarchy and indexing
- approaches, so the material would be accessible from
- many different viewpoints.
-
- The interface design is based almost entirely upon use
- of the four cursor arrow keys. The arrows are a
- metaphor for "jumping" to another location in the
- information base. The up and down cursor arrows
- select from links displayed on the screen; the right
- cursor arrow executes the jump; the left cursor arrow
- backtracks to the origin or "jump-off" location.
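-
- The four-key model can be expressed as a small state
- machine with a backtrack stack. This is an illustrative
- model, not MaxThink's code; wrapping the highlight
- around on the up and down keys is an assumption.
-
```python
class Navigator:
    """Arrow-key navigation over linked nodes (illustrative model only).

    `nodes` maps a node name to the ordered list of link targets
    shown on that node's screen.
    """

    def __init__(self, nodes: dict[str, list[str]], start: str):
        self.nodes = nodes
        self.current = start
        self.selected = 0                          # highlighted link index
        self.history: list[tuple[str, int]] = []   # jump-off points

    def key(self, arrow: str) -> str:
        links = self.nodes.get(self.current, [])
        if arrow == "down" and links:
            self.selected = (self.selected + 1) % len(links)  # wrap (assumed)
        elif arrow == "up" and links:
            self.selected = (self.selected - 1) % len(links)
        elif arrow == "right" and links:
            # Remember the jump-off point, then follow the highlighted link.
            self.history.append((self.current, self.selected))
            self.current, self.selected = links[self.selected], 0
        elif arrow == "left" and self.history:
            # Backtrack to the jump-off location, restoring its highlight.
            self.current, self.selected = self.history.pop()
        return self.current
```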
-
- Larson says, "This hypertext navigation metaphor is so
- simple that it takes a user about 30 seconds to learn.
- It is complemented by providing an effective
- hierarchical system of networked menus, in combination
- with an indexing system." Both approaches use
- "embedded menus." These feature obvious, eye-
- readable, hypertext links, used along with a clear and
- obvious menu structure, or descriptive surrounding
- text.
-
- He goes on, "Our menus attempt to build a conceptual
- structure of the topic. We use metaphors to express
- the thought patterns or structures relating to the
- topic. We intend to express the domain structure with
- such memorable, obvious, metaphors, that users will
- adopt the structure; that it becomes their structure."
-
-
- A.2.c. Identify and briefly describe the general
- production tools or building tools used in
- construction of the system.
-
- Larson describes their approach to hypertext
- construction as generally building the system out of
- nodes or fragments of information, which have been
- "decomposed" from original printed documents. He
- notes the necessity for identifying the information
- content in the nodes, and linking or sequencing them
- into a meaningful, communicative, knowledge structure.
-
- He describes the use of three major tools for building
- the hypertext system. They include an editor, used
- for formatting and insertion of links; an outliner,
- used to form hypertext hierarchies; and a matrix
- outliner, or network builder, used to create complex
- hypertext networks. (More fully described in the next
- section.)
-
- He feels that these three tools give them the ability
- to construct three powerful and complementary
- approaches. He describes these as:
- * Taxonomic approach - using hierarchies
- * Linguistic approach - using the glossary index
- * Hypertext network - using the complex
- interconnected networks
-
- He also mentions the use of various utility programs,
- described in the next section.
-
-
- A.2.d. Identify and briefly describe the specialized
- organizational and quality control tools which allow
- you to build the system.
-
- ** "TransText" - the hypertext word processor. They consider
- this editor the most important tool. It is used
- for formatting, editing, "splitting up" or breaking
- the file into nodes, and for insertion of hypertext
- links. It thus handles both transformation of the
- file information into effective, communicative,
- display format, as well as the insertion of the links
- themselves.
-
- ** "MaxThink" - outliner, used as the major
- hierarchical tool. It can create classes, sequence,
- boundaries, and hierarchies (with inheritance). It is
- used to create logical structures or metaphors of the
- information domain, which can automatically generate
- hypertext hierarchies.
-
- ** "Houdini" - network-building tool. This program
- is a matrix outliner, and can build "3-dimensional"
- outlines, where any node can be connected to any other
- node. These networks can also interconnect to and
- within other networks. Again, the Houdini matrix
- networks can automatically generate hypertext
- networks. The network headings also generate a KWOC
- "glossary" index, which is always instantly available
- to the user.
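-
- The way the MaxThink outliner generates a hypertext
- hierarchy can be approximated as follows. The outline
- line format ("FILENAME Title") and two-space
- indentation are assumptions, not MaxThink's file
- format. Each entry with children becomes a menu node
- listing them as links; menus below the top level also
- get an "Up:" link to their parent. Leaf entries are
- treated as existing text nodes and left untouched.
-
```python
def outline_to_menus(outline: str, indent: int = 2) -> dict[str, str]:
    """Turn an indented outline into menu nodes (hypothetical format)."""
    stack: list[tuple[int, str]] = []          # (depth, filename) of ancestors
    children: dict[str, list[tuple[str, str]]] = {}
    parent_of: dict[str, str] = {}
    for raw in outline.splitlines():
        if not raw.strip():
            continue
        depth = (len(raw) - len(raw.lstrip(" "))) // indent
        name, _, title = raw.strip().partition(" ")
        # Pop ancestors at the same or deeper level than this entry.
        while stack and stack[-1][0] >= depth:
            stack.pop()
        if stack:
            parent = stack[-1][1]
            children.setdefault(parent, []).append((name, title))
            parent_of[name] = parent
        stack.append((depth, name))

    nodes: dict[str, str] = {}
    for parent, kids in children.items():
        lines = [f"<{name}> {title}" for name, title in kids]
        if parent in parent_of:
            lines.append(f"Up: <{parent_of[parent]}>")
        nodes[parent] = "\n".join(lines)
    return nodes
```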
-
- Larson pointed out that they also use a number of
- utility programs for specialized editing and control
- functions. Some examples are:
-
- ** REFALL - shows all hypertext jumps FROM a file.
- Good for analyzing patterns of hypertext linkage.
- ** INVERT - shows all hypertext jumps TO a file.
- Good for analyzing patterns of hypertext linkage.
- ** CONNECT - shows all generations of input and
- output links to a group of specified files. Good for
- analyzing patterns of hypertext linkage.
- ** LINE - creates a <linked> list of all hypertext
- source text nodes, including title line or descriptive
- first line of text. List can be used with the
- TransText editor, or imported into the MaxThink
- outliner or Houdini matrix outliner. Good for
- identification and network incorporation of text
- content nodes.
- ** IC - (Integrity Checker) used to check for blind
- references to non-existent files and for link name
- errors.
- ** Glossary building utilities - produce an "online
- index" to network nodes and file titles, presented in
- KWOC format. They perform depluralization, synonym
- control, and sorting of index entries by source
- document type.
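-
- The link-analysis utilities above can be approximated
- over a set of node texts. These functions reimplement
- the described behavior for illustration; they are not
- the actual REFALL, INVERT, CONNECT, or IC programs,
- whose output formats the interview does not describe.
- Node names are assumed to be upper case.
-
```python
import re
from collections import defaultdict

LINK_RE = re.compile(r"<([^<>\s]+)>")  # assumed: any bracketed non-blank name


def refall(nodes: dict[str, str]) -> dict[str, set[str]]:
    """Jumps FROM each file (cf. REFALL)."""
    return {name: {m.upper() for m in LINK_RE.findall(text)}
            for name, text in nodes.items()}


def invert(out_links: dict[str, set[str]]) -> dict[str, set[str]]:
    """Jumps TO each file (cf. INVERT)."""
    inbound: dict[str, set[str]] = defaultdict(set)
    for src, targets in out_links.items():
        for dst in targets:
            inbound[dst].add(src)
    return dict(inbound)


def connect(out_links: dict[str, set[str]], start: set[str],
            generations: int) -> set[str]:
    """Files within `generations` jumps of `start`, in or out (cf. CONNECT)."""
    inbound = invert(out_links)
    seen, frontier = set(start), set(start)
    for _ in range(generations):
        nxt: set[str] = set()
        for f in frontier:
            nxt |= out_links.get(f, set()) | inbound.get(f, set())
        frontier = nxt - seen
        seen |= frontier
    return seen


def blind_references(out_links: dict[str, set[str]]) -> dict[str, set[str]]:
    """Links that point at non-existent files (cf. IC)."""
    existing = set(out_links)
    blind: dict[str, set[str]] = {}
    for src, targets in out_links.items():
        missing = targets - existing
        if missing:
            blind[src] = missing
    return blind
```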
-
-
-
- B. THE HYPERTEXT INFORMATION ACCESS SYSTEM:
-
- B.1. ACCESS POINTS - Which of the following types of
- access points are included in your system?
- For each question item, please rate using the following
- categories, and comment as needed...
-
- P)resent, E)asily achievable, M)odifications needed,
- N)ot achievable
-
- B.1.a. Main file sequence - direct file access
- Category: [P] E M N
-
- Hypertext nodes retrievable by ASCII file name.
-
- B.1.b. Author
- Category: [P] E M N
-
- Editorial decision. Author indexing is included in
- DaTa, in many instances.
-
- B.1.c. Title
- Category: [P] E M N
-
- Editorial decision. Included in DaTa, in many cases.
-
- B.1.d. Name forms
- Category: [P] E M N
-
- Editorial decision. Optionally included.
-
- B.1.d.i. Personal names
- Category: [P] E M N
-
- Editorial decision. Optionally included.
-
- B.1.d.ii. Corporate names (Companies, organizations,
- government, etc.)
- Category: [P] E M N
-
- Editorial decision. Optionally included.
-
- B.1.e. Keywords
- Category: [P] E M N
-
- Keyword access through "Glossary" KWOC index.
-
- B.1.f. Subject/Topic/Concept
- Category: [P] E M N
-
- Via hierarchy, network, and KWOC index.
-
- B.1.g. Geographic
- Category: [P] E M N
-
- Editorial decision. Optionally included. Present in
- DaTa as part of hierarchy.
-
- B.1.h. Date, chronological, temporal
- Category: [P] E M N
-
- Editorial decision. Optionally included. Present in
- DaTa as part of hierarchy, as well as in filename
- conventions.
-
- B.1.i. Language
- Category: [P] E M N
-
- This is purely an editorial decision; the capability
- is present. Minor software modifications may be
- needed to handle the extended ASCII character set for
- foreign languages.
-
- B.1.j. Document format - book, article, pamphlet, report,
- etc.
- Category: P [E] M N
-
- Editorial decision. Optionally included.
-
- B.1.k. Document position - section, page, location
- Category: [P] E M N
-
- Editorial decision. Can optionally be included as
- part of hierarchy. This would be labor-intensive. It
- would be most efficient to add this as a link call to
- an external searching program, with the ability to
- handle positional or string specifications.
-
- B.1.l. Automated field specifications - record size, entry
- date, notations, originator, etc.
- Category: [P] E M N
-
- MaxThink utilities include a string-searching program,
- callable from embedded hypertext link. The hypertext
- links can similarly call any external DOS program.
- [The investigator, for example, has built a system
- with link calls to the Zyindex text search & retrieval
- program. The Zyindex index file allowed full-text
- search of the entire hypertext database, in addition
- to regular hypertext links.]
-
-
- B.2. ACCESS APPROACHES - Which of the following subject or
- topical information devices are used in your system?
- For each question item, please rate using the following
- categories, and comment as needed...
-
- P)resent, E)asily achievable, M)odifications needed,
- N)ot achievable
-
-
- B.2.a. Classification schemes
-
- B.2.a.i. Hierarchical taxonomy
- Category: [P] E M N
-
- Yes, we view the generated hierarchy and linked
- network as a classification scheme, more flexible and
- powerful than the standard linear taxonomy.
-
- B.2.a.ii. Enumerative, universal, classification [Dewey
- type classification]
- Category: P [E] M N
-
- Editorial decision. Optionally included. Any
- classification can be embedded or expressed in the
- hypertext hierarchy.
-
- B.2.a.iii. Specialized, literary warrant, classification
- [Library of Congress, Reader Interest Classification]
- Category: P [E] M N
-
- Editorial decision. Optionally included. Any
- classification can be embedded or expressed in the
- hypertext hierarchy.
-
- B.2.a.iv. Faceted classification (analytico-synthetic)
- [PRECIS style of indexing] [C., p. 65]
- Category: P [E] M N
-
- Editorial decision. Optionally included. Any
- classification can be embedded or expressed in the
- hypertext hierarchy.
-
- B.2.b. Indexing approaches
-
- B.2.b.i. Alphabetical index, separate or dictionary file
- Category: [P] E M N
-
- Present.
-
- B.2.b.i.A. Keywords, extracted or assigned
- Category: [P] E M N
-
- They have utilities for term extraction and will develop
- them further. The KWOC index utility rotates assigned
- network headings or file title words.
-
- B.2.b.i.B. Controlled vocabulary assignment
- Category: P [E] M N
-
- Editorial decision. Optionally included. At present,
- the KWOC index utility optionally rotates either
- assigned network headings or file title phrases.
-
- B.2.b.i.C. Relative index, e.g., to Dewey classification
- Category: P [E] M N
-
- Editorial decision. Optionally included via taxonomy.
-
- B.2.b.ii. Term manipulation indexes (generally for
- production of printed output)
- Category: [P] E M N
-
- An integral part of the system.
-
- B.2.b.ii.A. Simple permuted or rotated - KWIC
- Category: P [E] M N
-
- Editorial decision. Optionally included.
-
- B.2.b.ii.B. Ordered by extracted element - KWOC
- Category: [P] E M N
-
- An integral part of the system.
-
- B.2.b.ii.C. String indexing (phrase-manipulation, rotation
- of terms) - PRECIS, NEPHIS, etc.
- Category: P [E] M N
-
- Editorial decision. Optionally included. Achievable
- by creating index with external utility, then
- importing into taxonomy form.
-
- B.2.b.ii.D. Chain indexing (string indexing, with forms
- reflecting the basic taxonomy of terms) [C., p. 67]
- Category: P [E] M N
-
- Editorial decision. Optionally included. Achievable
- by creating index with external utility, then
- importing into taxonomy form.
-
- B.2.b.iii. Classified index (generally requires secondary
- alphabetical index, for ease of use) [C., p. 56]
- Category: P [E] M N
-
- Editorial decision. Optionally included. Achievable
- by creating index with external utility, then
- importing into taxonomy form.
-
- B.2.b.iv. Coordinate indexing - Manual coordination or
- automated database file, using Boolean search [C., p.
- 60]
- Category: P [E] M N
-
- Editorial decision. Optionally included. Achievable
- by call to external program.
-
- B.2.b.iv.A. Older non-automated searching methods -
- peekaboo, edge-notched cards, Uniterm terminal digit
- cards
- Category: P E M [N]
-
- Not applicable. This system does not use a hard copy
- format file record.
-
- B.2.b.iv.B. Database file search - Sequential or indexed
- field search
- Category: P [E] M N
-
- Editorial decision. Optionally included. Achievable
- by call to external program.
-
- B.2.b.iv.C. Full text search
- Category: [P] E M N
-
- During the interview, and elsewhere, Larson voices
- strong subjective disapproval of this information
- retrieval approach (Fersko-Weiss 1991). Nevertheless,
- MaxThink provides SEARCH and CD-INDEX, two program
- modules which provide this option. This is an
- editorial decision; the text-searching feature may be
- optionally included. The hypertext links can also
- call other, more powerful, string-searching programs.
-
- An example is National Legal Research Systems' Qwik-
- Rules (TM) legal rules hypertext information system.
- They used MaxThink hypertext software to build the
- system, and provide links to QWIKFIND, their own text-
- searching engine. As elsewhere mentioned, the
- investigator himself has also built systems with link
- calls to Zyindex, Golden Retriever, Power Search, and
- other text-searching programs.
-
- B.2.b.v. Faceted indexing [C., p. 65]
- Category: P [E] M N
-
- Editorial decision. Optionally included. Achievable
- by creating index with external utility, then
- importing into taxonomy form.
-
- B.2.b.vi. Citation indexing [C., p. 72]
- Category: P [E] M N
-
- Editorial decision. Optionally included. Achievable
- by creating index with external utility, then
- importing into taxonomy form.
-
-
- B.3. CONTROL MECHANISMS - Which of the following subject
- access control measures, intended to control
- consistency, form, and item sequencing, are present in
- your system?
- For each question item, please rate using the following
- categories, and comment as needed...
-
- P)resent, E)asily achievable, M)odifications needed,
- N)ot achievable
-
-
- B.3.a. Classification schedule
- Category: [P] E M N
-
- The hierarchical taxonomy is equivalent to a flexible
- classification schedule, in our opinion.
-
- B.3.b. Vocabulary control systems
- Category: [P] E M N
-
- Editorial decision. Optionally included. Our
- Glossary utility presently uses controls on the form of
- entry, e.g., depluralization (singular preferred),
- synonym cross-references, stopword lists for the KWOC
- index, and automatic sorting by entry type. We are also
- considering automatic word-stemming for the KWOC
- index.
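-
- A simplified KWOC (keyword out of context) builder that
- applies these entry controls is sketched below: each
- significant title word, after stopword removal,
- depluralization, and synonym mapping, becomes a
- heading over the full title and its filename. The
- stopword list, synonym table, and plural-stripping
- rule are hypothetical and much cruder than a production
- utility would need; sorting by entry type is omitted.
-
```python
STOPWORDS = {"a", "an", "and", "for", "in", "of", "on", "the", "to"}
SYNONYMS = {"auditor": "audit"}  # hypothetical: variant -> preferred term


def depluralize(word: str) -> str:
    """Crude singular-preferred rule; real English needs exception lists."""
    if word.endswith("ies") and len(word) > 4:
        return word[:-3] + "y"
    if word.endswith("s") and not word.endswith("ss") and len(word) > 3:
        return word[:-1]
    return word


def kwoc_index(titles: dict[str, str]) -> list[tuple[str, str, str]]:
    """Build sorted KWOC entries (KEYWORD, title, filename) from node titles."""
    entries: set[tuple[str, str, str]] = set()
    for filename, title in titles.items():
        for raw in title.lower().split():
            word = raw.strip(".,;:()\"'")
            if not word or word in STOPWORDS:
                continue
            word = depluralize(word)          # singular form preferred
            word = SYNONYMS.get(word, word)   # map variants to preferred term
            entries.add((word.upper(), title, filename))
    return sorted(entries)
```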
-
- B.3.b.i. Authority/Headings files
- Category: P [E] M N
-
- Editorial decision. Optionally included. Achievable
- by external manual or automated means.
-
- B.3.b.ii. Thesaurus control
- Category: P [E] M N
-
- Editorial decision. Optionally included. Achievable
- by external manual or automated means.
-
- B.3.b.iii. Derived-term methods or algorithms
- Category: P [E] M N
-
- The DaTa operation already uses term extraction
- utilities for analyzing files and groups of files.
- MaxThink is considering developing more advanced term
- extraction utilities, based on word frequency, per
- Miranda Pao. This could also be achieved by using
- third-party software for index term extraction.
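-
- A generic word-frequency approach to candidate term
- extraction is sketched below. It does not attempt to
- reproduce Pao's published method; the stopword list,
- the minimum word length, and the thresholds are
- assumptions chosen for illustration.
-
```python
import re
from collections import Counter

STOPWORDS = {"a", "an", "and", "are", "for", "in", "is", "of", "the", "to"}


def candidate_terms(texts: list[str], min_count: int = 2,
                    top_n: int = 10) -> list[tuple[str, int]]:
    """Rank non-stopword words by frequency across a group of files."""
    counts: Counter[str] = Counter()
    for text in texts:
        for word in re.findall(r"[a-z]+", text.lower()):
            if word not in STOPWORDS and len(word) > 2:
                counts[word] += 1
    # Keep only words that recur often enough to suggest an index term.
    ranked = [(w, c) for w, c in counts.most_common() if c >= min_count]
    return ranked[:top_n]
```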
-
- B.3.b.iv. Hierarchical search thesaurus (for database file
- search)
- Category: P E M [N]
-
- This approach is neither currently used nor realistic,
- since the primary approach is not a "searching"
- methodology. If editorial decision mandates, authors
- could achieve this via link call to external searching
- program with this capability. E.g., Zyindex,
- MicroBASIS.
-
- B.3.b.v. Entry term form control mechanisms
- Category: [P] E M N
-
- Editorial decision. Optionally included. Achievable
- externally, using manual or automated means.
-
-
- B.3.b.v.A. Entry syntax (preferred noun/adjective, etc.,
- construction form)
- Category: [P] E M N
-
- The present approach is entirely a matter of editorial
- policy. E.g., the DaTa CD-ROM product operates with
- preferred usages.
-
- B.3.b.v.B. Standard number approach (plural, singular
- form preference)
- Category: [P] E M N
-
- The present DaTa approach prefers the singular form and
- applies depluralization in the glossary KWOC utility.
-
- B.3.b.v.C. Automatic depluralization (database file)
- Category: P [E] M N
-
- Not applicable using the associative linking approach.
- Depluralization can be implemented in hypertext index
- representations. The present DaTa approach prefers the
- singular form and applies depluralization in the
- glossary KWOC utility. As an alternative, an author
- can also use links to external database software with
- this capability.
-
- B.3.b.v.D. Synonym definition (database file)
- Category: [P] E M N
-
- This is an editorial decision. The KWOC glossary
- utility program includes automatic synonym handling,
- cross-references, etc., for construction of the KWOC
- index.
-
- B.3.c. "Standard Subdivision" or faceted classification
- protocol
- Category: [P] E M N
-
- They use standard extensions in filename conventions for
- document types; also use standard coding to reflect
- document types in network/glossary files. This also
- results in sorting by document or node type in the
- KWOC index.
-
- B.3.d. Term or descriptor relationships - Roles, links,
- weighting
- Category: P [E] M N
-
- Neither currently used nor realistic, since the primary
- approach is not a "searching" methodology. If an
- editorial decision mandated it, this could be achieved
- by a link call to external searching programs with this
- capability.
-
- B.3.e. Filing or sorting rules
- Category: [P] E M N
-
- For convenience, they currently use straight ASCII
- sort for the KWOC index, with sub-sorts by document or
- node type. The network taxonomy certainly reflects a
- subjective, author-imposed, ordering or hierarchy.
-
- Any other sorting sequence for the KWOC could be
- supported with the correct algorithm for the external
- sorting utility.
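-
- The straight ASCII sort with a document-type sub-sort
- can be expressed as a compound sort key. The extension
- codes and their ranking below are hypothetical; unknown
- types sort after known ones, then by filename.
-
```python
# Hypothetical extension -> document-type rank (lower sorts first).
DOC_TYPE_ORDER = {"STD": 0, "INT": 1, "MNU": 2}


def doc_type(filename: str) -> str:
    """Document type taken from the standard filename extension."""
    return filename.rpartition(".")[2].upper()


def sort_kwoc(entries: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """Sort (term, filename) pairs: ASCII order on the term, then by
    document type, then by filename."""
    def key(entry: tuple[str, str]) -> tuple[str, int, str]:
        term, filename = entry
        rank = DOC_TYPE_ORDER.get(doc_type(filename), len(DOC_TYPE_ORDER))
        return (term, rank, filename)
    # Python compares str by code point, which matches ASCII order here
    # (so upper-case terms sort before lower-case ones).
    return sorted(entries, key=key)
```
-
- Swapping in a different ordering only means changing the
- key function, which is the kind of "correct algorithm"
- change the passage above refers to.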
-
- B.3.f. Manual or automated authority/procedural safety
- measures
- Category: [P] E M N
-
- A full set of utilities, described above, checks
- linking patterns, clustering, link name spelling
- errors, blind references, file text contents, etc.
-
- In addition, the production team follows standard
- computer operating practice: file backups, off-site
- copies, working-copy backups, etc.